Learning Complex Neural Network Policies with Trajectory Optimization

نویسندگان

  • Sergey Levine
  • Vladlen Koltun
چکیده

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Guided Policy Search

Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynam...

متن کامل

Learning Dynamic Manipulation Skills under Unknown Dynamics with Guided Policy Search

Planning and trajectory optimization can readily be used for kinematic control of robotic manipulation. However, planning dynamic motor skills requires a detailed physical simulation, and some aspects of the task, such as contacts, are very difficult to simulate with enough accuracy for dynamic manipulation. Alternatively, manipulation skills can be learned from experience, allowing them to def...

متن کامل

Learning a Structured Neural Network Policy for a Hopping Task

In this work, we attempt to learn a neural network policy for dynamic, underactuated locomotion tasks. Learning a policy for such a task is non trivial due to the dynamic, fast changing, non linear and contact rich dynamics of this task. We use existing trajectory optimization techniques to optimize a set of policies. For this, we present a method that allows to learn contact rich dynamics for ...

متن کامل

Deep Learning Quadcopter Control via Risk-Aware Active Learning

Modern optimization-based approaches to control increasingly allow automatic generation of complex behavior from only a model and an objective. Recent years has seen growing interest in fast solvers to also allow real-time operation on robots, but the computational cost of such trajectory optimization remains prohibitive for many applications. In this paper we examine a novel deep neural networ...

متن کامل

Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems. These trajectory distributions can be used within the framework of guided policy search to learn policies with an arbitrary parameterization. Our method fits time-varying linear dynamics models to speed up learning, but does not rely on learni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014